
    Large-scale Hierarchical Alignment for Data-driven Text Rewriting

    We propose a simple unsupervised method for extracting pseudo-parallel monolingual sentence pairs from comparable corpora representative of two different text styles, such as news articles and scientific papers. Our approach does not require a seed parallel corpus; instead, it relies solely on hierarchical search over pre-trained embeddings of documents and sentences. We demonstrate the effectiveness of our method through automatic and extrinsic evaluation on text simplification from standard Wikipedia to Simple Wikipedia. We show that pseudo-parallel sentences extracted with our method not only supplement existing parallel data but can even lead to competitive performance on their own. Comment: RANLP 201
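    The abstract describes the search only at a high level: documents are matched first, then sentences within matched documents. The Python sketch below illustrates one way such a two-level search could work. It assumes precomputed document and sentence embeddings and cosine similarity. The `k_docs` cutoff and `sent_threshold` filter are hypothetical parameters introduced for illustration, and the paper's actual retrieval and filtering criteria may differ.

```python
import numpy as np


def normalize(m):
    # L2-normalize rows so that dot products equal cosine similarities
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.clip(norms, 1e-12, None)


def hierarchical_align(src_docs, tgt_docs, k_docs=1, sent_threshold=0.5):
    """Sketch of document-then-sentence nearest-neighbour alignment.

    src_docs, tgt_docs: lists of (doc_embedding, sentence_embeddings) tuples,
    where doc_embedding has shape (d,) and sentence_embeddings (n_sents, d).
    k_docs and sent_threshold are hypothetical, illustrative parameters.
    Returns (src_doc, src_sent, tgt_doc, tgt_sent, similarity) tuples.
    """
    src_doc_mat = normalize(np.stack([doc for doc, _ in src_docs]))
    tgt_doc_mat = normalize(np.stack([doc for doc, _ in tgt_docs]))
    doc_sims = src_doc_mat @ tgt_doc_mat.T

    pairs = []
    for i, (_, src_sent_emb) in enumerate(src_docs):
        # Level 1: keep only the k most similar target documents
        top_docs = np.argsort(-doc_sims[i])[:k_docs]
        src_sents = normalize(src_sent_emb)
        for j in top_docs:
            tgt_sents = normalize(tgt_docs[j][1])
            sims = src_sents @ tgt_sents.T
            # Level 2: nearest target sentence for each source sentence,
            # searched only inside the matched document pair
            best = sims.argmax(axis=1)
            for s, t in enumerate(best):
                if sims[s, t] >= sent_threshold:
                    pairs.append((i, s, int(j), int(t), float(sims[s, t])))
    return pairs
```

    Restricting the sentence search to matched documents keeps the comparison cost far below that of an all-pairs sentence search over both corpora.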

    A Latent Source Model for Nonparametric Time Series Classification

    For classifying time series, a nearest-neighbor approach is widely used in practice, with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there are not actually that many prototypical time series to begin with, relative to the number of time series we have access to. For example, topics become trends on Twitter in only a few distinct ways, whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a "weighted majority voting" classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees for both weighted majority voting and nearest-neighbor classification under our model, accounting for how much of the time series we observe and for the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority voting to forecast which news topics on Twitter become trends, detecting such "trending topics" in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%. Comment: Advances in Neural Information Processing Systems (NIPS 2013)
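    To make the contrast between the two rules concrete, the Python sketch below gives every labeled training series an exponentially decaying vote based on its distance to the observed prefix of a new series. It then compares the total positive vote with the total negative vote. Nearest-neighbor classification corresponds to keeping only the single closest example. This is a minimal sketch under stated assumptions: squared Euclidean distance on aligned prefixes (no time-shift handling), and `gamma` and `theta` as illustrative decay and decision-ratio parameters. It is not the paper's exact formulation.

```python
import numpy as np


def _log_sum_exp(x):
    # Numerically stable log(sum(exp(x))), so far-away series do not underflow
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def _sq_dists(s, examples):
    # Squared Euclidean distance between the observed prefix s and the
    # first len(s) time steps of each training series (no time shifts)
    examples = np.asarray(examples, dtype=float)
    return np.sum((examples[:, : len(s)] - s) ** 2, axis=1)


def weighted_majority_vote(s, positives, negatives, gamma=1.0, theta=1.0):
    """Return +1 if sum(exp(-gamma * d)) over positive examples is at least
    theta times the same sum over negative examples, else -1.
    gamma and theta are illustrative parameters (assumptions)."""
    s = np.asarray(s, dtype=float)
    log_pos = _log_sum_exp(-gamma * _sq_dists(s, positives))
    log_neg = _log_sum_exp(-gamma * _sq_dists(s, negatives))
    return 1 if log_pos >= np.log(theta) + log_neg else -1


def nearest_neighbor(s, positives, negatives):
    """Label of the single closest training series (ties go to +1)."""
    s = np.asarray(s, dtype=float)
    d_pos = _sq_dists(s, positives).min()
    d_neg = _sq_dists(s, negatives).min()
    return 1 if d_pos <= d_neg else -1
```

    Working in log space keeps the comparison meaningful even when every training series is far from the observation, where the raw exponential votes would all round to zero.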

    Unitary Positive-Energy Representations of Scalar Bilocal Quantum Fields

    The superselection sectors of two classes of scalar bilocal quantum fields in D ≥ 4 dimensions are explicitly determined by working out the constraints imposed by unitarity. The resulting classification in terms of the dual of the respective gauge groups U(N) and O(N) confirms the expectations based on general results obtained in the framework of local nets in algebraic quantum field theory, while the approach, which uses standard Lie algebra methods rather than abstract duality theory, is complementary. The result indicates that one does not lose interesting models if one postulates the absence of scalar fields of dimension D-2 in models with global conformal invariance. Another remarkable outcome is the observation that, with an appropriate choice of the Hamiltonian, a Lie algebra embedded into the associative algebra of observables completely fixes the representation theory. Comment: 27 pages, v3: result improved by eliminating redundant assumptions

    New methods in conformal partial wave analysis

    We report on progress concerning the partial wave analysis of higher correlation functions in conformal quantum field theory. Comment: 16 pages

    Survey of Ecological Characteristics of Boreal Tree Species in Fennoscandia and the USSR

    The paper presents results from a literature study on the autecological characteristics of North European and Asian boreal and boreo-nemoral tree species. It also provides general ecological information about the main forest types in the boreal region of the USSR and Fennoscandia. The work was done mainly during the Young Scientist's Summer Program of 1988 and is part of the Biosphere Dynamics Project activities. Species natural history data have been collected and assembled so that they can be used to parameterize and modify existing (or newly formulated) mixed-species forest stand simulators (e.g., gap models). The ecological survey covers 27 tree species divided into two groups. The first, called "dominant tree species", includes 13 major forest-forming species of the present-day boreal forests of the USSR and Fennoscandia, while the second, "important species", contains species that either dominate forests in boreal border areas (i.e., boreo-nemoral forests) or have a restricted distribution within the boreal zone. We attempted to characterize each species as completely as possible under the following categories: systematics (scientific name, author and synonymies), spatial distribution (description and maps of the continuous range of natural growth), habitat requirements (climate, soil types, associated species, and forest types), life history (reproduction and growth), response to environmental factors (light, soil moisture, nutrients, frost, permafrost, fire, windstorm, flooding and paludification), races and hybrids, and enemies and diseases. The data from the autecological reviews are summarized as 24 input model parameters in the Appendix. The paper should be considered a first step in building a boreal tree species natural history database for use with simulation models. It is also the first attempt to compile autecological data about North Asian tree species for modeling purposes.

    Character-level Chinese-English Translation through ASCII Encoding

    Character-level Neural Machine Translation (NMT) models have recently achieved impressive results on many language pairs. They perform best on Indo-European language pairs, where the languages share the same writing system. However, for translating between Chinese and English, the gap between the two writing systems poses a major challenge because there is no systematic correspondence between their individual linguistic units. In this paper, we enable character-level NMT for Chinese by breaking down Chinese characters into linguistic units similar to those of Indo-European languages. We use the Wubi encoding scheme, which preserves the original shape and semantic information of the characters while also being reversible. We show promising results from training Wubi-based models at the character and subword levels with both recurrent and convolutional models. Comment: 7 pages, 3 figures, 3rd Conference on Machine Translation (WMT18), 201

    Infinite dimensional Lie algebras in 4D conformal quantum field theory

    The concept of global conformal invariance (GCI) opens the way to applying algebraic techniques, developed in the context of 2-dimensional chiral conformal field theory, to higher (even) dimensional space-time. In particular, a system of GCI scalar fields of conformal dimension two gives rise to a Lie algebra of harmonic bilocal fields, V_m(x,y), where m ranges over a finite-dimensional real matrix algebra M closed under transposition. The associative algebra M is irreducible iff its commutant M' coincides with one of the three real division rings. In each case, the Lie algebra of (the modes of) the bilocal fields is an infinite-dimensional Lie algebra: a central extension of sp(∞,R) corresponding to the field R of reals, of u(∞,∞) associated with the field C of complex numbers, and of so*(4∞) related to the algebra H of quaternions. They give rise to quantum field theory models with superselection sectors governed by the (global) gauge groups O(N), U(N), and U(N,H)=Sp(2N), respectively. Comment: 16 pages, with minor improvements as to appear in J. Phys.

    Jacobi Identity for Vertex Algebras in Higher Dimensions

    Vertex algebras in higher dimensions provide an algebraic framework for investigating axiomatic quantum field theory with global conformal invariance. We develop further the theory of such vertex algebras by introducing formal calculus techniques and investigating the notion of polylocal fields. We derive a Jacobi identity which, together with the vacuum axiom, can be taken as an equivalent definition of vertex algebra. Comment: 35 pages, references added